Three empirical studies on the agreement of reviewers about the quality of software engineering experiments

Authors

  • Barbara A. Kitchenham
  • Dag I. K. Sjøberg
  • Tore Dybå
  • Dietmar Pfahl
  • Pearl Brereton
  • David Budgen
  • Martin Höst
  • Per Runeson
Abstract

Context: During systematic literature reviews it is necessary to assess the quality of empirical papers. Current guidelines suggest that two researchers should independently apply a quality checklist and that any disagreements must be resolved. However, there is little empirical evidence concerning the effectiveness of these guidelines.

Aims: This paper investigates three techniques that can be used to improve the reliability (i.e. the consensus among reviewers) of quality assessments: the number of reviewers, the use of a set of evaluation criteria, and consultation among reviewers. We undertook a series of studies to investigate these factors.

Method: Two studies involved four research papers and eight reviewers using a quality checklist with nine questions. The first study was based on individual assessments, the second on joint assessments with a period of inter-rater discussion. A third, more formal randomised block experiment involved 48 reviewers assessing two of the papers used previously, in teams of one, two and three persons, to assess the impact of discussion among teams of different sizes, using the evaluations of the “teams” of one person as a control.

Results: For the first two studies, the inter-rater reliability was poor for individual assessments, but better for joint evaluations. However, the results of the third study contradicted the results of Study 2. Inter-rater reliability was poor for all groups but worse for teams of two or three than for individuals.

Conclusions: When performing quality assessments for systematic literature reviews, we recommend using three independent reviewers and adopting the median assessment. A quality checklist seems useful but it is difficult to ensure that the checklist is both appropriate and understood by reviewers. Furthermore, future experiments should ensure participants are given more time to understand the quality checklist and to evaluate the research papers.

© 2011 Elsevier B.V. All rights reserved.
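
The recommendation to use three independent reviewers and adopt the median assessment can be illustrated with a minimal sketch. The abstract does not say whether the median is taken per checklist question or over each reviewer's total, so both variants are shown. The function names, the reviewer labels, the 1 to 4 scoring scale and the scores themselves are hypothetical and are not taken from the paper.

```python
from statistics import median


def median_per_question(scores_by_reviewer):
    """Return the median score for each checklist question.

    scores_by_reviewer maps a reviewer label to a list of scores,
    one per checklist question, all lists in the same question order.
    """
    score_lists = list(scores_by_reviewer.values())
    n_questions = len(score_lists[0])
    if any(len(scores) != n_questions for scores in score_lists):
        raise ValueError("every reviewer must answer the same questions")
    # zip(*...) regroups the scores so each tuple holds one question's scores.
    return [median(question) for question in zip(*score_lists)]


def median_of_totals(scores_by_reviewer):
    """Return the median of the reviewers' total checklist scores."""
    return median(sum(scores) for scores in scores_by_reviewer.values())


# Hypothetical scores from three independent reviewers on a
# nine-question checklist (assumed scale: 1 = poor ... 4 = good).
scores = {
    "reviewer_a": [3, 2, 4, 1, 3, 2, 4, 3, 2],
    "reviewer_b": [2, 2, 3, 1, 4, 2, 3, 3, 1],
    "reviewer_c": [4, 1, 3, 2, 3, 3, 4, 2, 2],
}

per_question = median_per_question(scores)
print(per_question)              # [3, 2, 3, 1, 3, 2, 4, 3, 2]
print(sum(per_question))         # 23
print(median_of_totals(scores))  # 24
```

With an odd number of reviewers the median is always one of the scores actually given, so no averaging or discussion step is needed to settle a disagreement.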

Similar resources

Experimental Study for Protection of Piers Against Local Scour Using Slots

Local scour is the most important cause of bridge failure. In this study, laboratory experiments were conducted to investigate the effectiveness of a slot as a protection device in reducing the depth of scour at cylindrical piers under clear-water flow conditions. The development of scour depth over time at the circular pier, with and without a slot as a protection device, was investigated. The experimen...

Development of a Low Cost and Safe PIV for Mean Flow Velocity and Reynolds Stress Measurements

In this study, a white light particle image velocimetry (WL PIV) system which employs a light sheet generated with a flash was used. The system was developed in order to provide a cost-efficient and safe alternative to laser systems while keeping the accuracy limits required for hydraulic model tests. To investigate the accuracy of WL PIV method under different flow conditions, experiments were...

A 3D Numerical and Empirical Study on the Effects of Injection Pressure and Temperature on the Quality of Produced Mold

Plastic injection is a method in which plastic, fed as granules through an extruder, is injected into a hole at high pressure. Because the two flow fronts meet in this process, a welding line is formed. Along the welding line the strength of the produced part is low; therefore the position of the welding line and its clarity are very important. In this paper, analyses have been done with Fluent an...

Presenting an Empirical Correlation for Maximum Sauter Mean Diameter in a Spray Extraction Column

Given the importance of drop behavior in liquid-liquid extraction, the maximum Sauter mean drop diameter has been investigated and correlated in a counter-current spray extraction column with two chemical systems. The spargers were sets of nozzles in all experiments. By studying the effects of several parameters on drop size, some correlations were estimated by the last available version of softw...

Failure Probability and Remaining Life Assessment of Reheater Tubes

In this study, a real and significant industrial problem in a steam power plant was investigated. Reheater tubes in boilers are subject to creep and fireside corrosion mechanisms that cause some of them to fail. Since estimating the probability of failure (PoF) and remaining life (RL) is expensive and time-consuming with deterministic methods, in this work they were evaluated using struc...

Journal:
  • Information & Software Technology

Volume 54  Issue 

Pages  -

Publication date 2012